How it works
- Open an agent and go to Safety and Evaluations > Simulation Engine.
- Define or auto-generate scenarios: situations the agent should handle. Examples: “angry customer demanding a refund,” “user asking an out-of-scope question.”
- Define or auto-generate personas: user types the agent will encounter. Examples: “non-technical user,” “enterprise decision-maker,” “hostile adversarial user.”
- The engine combines scenarios and personas into test cases automatically.
- Run the simulation. The engine executes each test case and scores the agent's responses on the metrics listed under Scoring metrics.
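The combination step can be pictured with a short sketch. The source does not say how the engine pairs scenarios with personas, so this assumes the simplest rule: every scenario is paired with every persona. `SimulationCase` and `build_test_cases` are hypothetical names used for illustration, not part of the product.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimulationCase:
    """One test case: a scenario played out by a given persona (hypothetical type)."""
    scenario: str
    persona: str


def build_test_cases(scenarios, personas):
    # Assumption: the engine pairs every scenario with every persona.
    return [SimulationCase(s, p) for s, p in product(scenarios, personas)]


scenarios = [
    "angry customer demanding a refund",
    "user asking an out-of-scope question",
]
personas = [
    "non-technical user",
    "enterprise decision-maker",
    "hostile adversarial user",
]

cases = build_test_cases(scenarios, personas)
print(len(cases))  # 6 (2 scenarios x 3 personas)
```

Under this assumption the number of test cases grows as the product of the two lists, so a handful of scenarios and personas is enough to produce broad coverage.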
Scoring metrics
| Metric | What it measures |
|---|---|
| Task Completion | Did the agent accomplish what the user asked? |
| Hallucination | Did the agent fabricate facts that its knowledge does not support? |
| Faithfulness | Is the response grounded in the connected Knowledge Base? |
| Toxicity | Did the agent produce harmful content? |
| Bias | Did the agent treat any group unfairly? |
| Tool Accuracy | Did the agent call the right tool with the correct arguments? |
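To make one of these metrics concrete, here is a rough sketch of how a Tool Accuracy score could be computed. The engine's actual scoring method is not documented in this section, so the rule below (a call counts only if both the tool name and the arguments match exactly, compared in order) is an assumption, and `ToolCall` and `tool_accuracy` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A single tool invocation (hypothetical structure)."""
    name: str
    arguments: dict = field(default_factory=dict)


def tool_accuracy(expected, actual):
    """Return the fraction of calls that match by position, name, and arguments.

    Assumption: extra or missing calls lower the score, because the
    denominator is the longer of the two call lists.
    """
    if not expected and not actual:
        return 1.0  # no tool call was needed and none was made
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return matches / max(len(expected), len(actual))


expected = [
    ToolCall("lookup_order", {"order_id": "A1"}),
    ToolCall("issue_refund", {"order_id": "A1", "amount": 20}),
]
actual = [
    ToolCall("lookup_order", {"order_id": "A1"}),
    ToolCall("issue_refund", {"order_id": "A1", "amount": 200}),  # wrong amount
]

print(tool_accuracy(expected, actual))  # 0.5
```

Here the agent picked the right tools but passed a wrong argument to the second one, so only one of the two calls counts as correct.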